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Bialek, Callan and Strong have recently given a solution of the problem of determining a con¬ 
tinuous probability distribution from a finite set of experimental measurements by formulating it as 
a one-dimensional quantum field theory. This report applies an extension of their formalism to the 
inference of functional parton distributions from scattering data. 

The parton model is applicable to large momentum transfer hadronic processes. Scaling in hadronic cross-sections is 
interpreted in the parton model as a consequence of the existence of charged pointlike constituents in hadrons. Parton 
model cross-sections are calculated by combining amplitudes for the scattering of these constituents with probability 
densities (parton distribution functions) for finding a given constituent in a hadron carrying a given fraction of the 
total momentum of the hadron. 

In perturbative QCD scattering cross-sections can be given a parton model-inspired form, by invoking a factor¬ 
ization scale, /r/. Roughly, it is supposed that all internal lines with momenta off-shell by more than pif are included 
in the amplitude factor of the cross-section, and the effects of all lines with softer momenta are included in the 
parton distributions. (The precise meaning of /if is incorporated in the operational definition of parton distribu¬ 
tion functions via factorization schemes, and will not be needed here.) The latter quantities cannot, at present, be 
derived from QCD, and must therefore be experimentally determined. There is an important theoretical constraint 
on these distributions: No physical quantity can depend on the choice of factorization scale, thus the calculable [if 
dependence of the amplitude factors can be translated into the fif dependence of the distribution functions, leading 
to the Gribov-Lipatov-Altarelli-Parisi (GLAP) evolution equations satisfied by the parton distribution functions. 
Knowledge of the parton distribution functions at some value of [if therefore is input for the prediction of hadron 
scattering cross-sections at other energies. Thus the importance of inferring these parton distribution functions cannot 
be over-emphasized—the prospect of extracting new physics from hadron collider experiments requires the subtraction 
of known hadronic physics to a high degree of sprecision. 

Besides systematic experimental uncertainties, the determination of parton distribution functions is a difficult task 
because the fitting procedure is of necessity somewhat circuitous [Q . Given the parton distribution functions at some 
momentum transfer, they must be evolved to other values using the GLAP equations. The experimental data is then fit 
to cross-sections computed from the set of evolved parton distribution functions. Then the initial parton distribution 
functions are altered to improve the experimental fit, and the whole set of steps is repeated. Furthermore, the 
functional form of the parton distributions is unknown—we do not know how to solve QCD. Thus a fit to parameters 
in a postulated functional form is beset with the worry that the true functional form of the distribution is not the 
postulated one, in which case it is quite probable that the true distribution does not even lie on the manifold of 
parameters at all. That this is not a far-fetched scenario is pointed out by the In x dependence that appears to 
smooth the [if dependence of a, b in the original x a {l — x) b form (lj. 

The aim of this report is to show that a recent formulation of the problem of functional statistical inference due 
to Bialek, Callan and Strong [|| can be simply extended to provide a framework for parametrization independent 
inference of parton distribution functions. 

My attention was drawn to this problem by G. Sterman who emphasized the importance of a parametrization- 
independent fit. Further, after this work was completed, and presented at several seminars, M. Peskin drew my 
attention to the work of Giele and Keller These authors point out the utility of Bayesian methods in the inference 
of parton distributions jdj], motivated by the problem of assigning error estimates to inferred parton distribution 
functions. It seems therefore that the present report contains material that complements |i|j, and fills a gap mentioned 
in the concluding section of |J. 

I review briefly the elegant formulation of the problem of inferring a continuous probability distribution from a finite 
number of data points given by Bialek, Callan and Strong ^j. A small variation on their work, to take into account 
the fact that probability distributions are densities, was given in ||,|]|, but I shall not bother with such niceties here. 
I then extend the work of |Q by constructing an exact solution of the equations obtained in j^] for a finite data set, 
instead of the WKB type analysis given in [3]. With this material in hand, the parton distribution function problem 
involves just a few conceptual extensions. 

Ref. [[| used Bayes’ rule to write the probability of the probability distribution Q, given the data {aq}, as 

( , ,, P[x 1 ,x 2 ,...,x n \Q(x)\P[Q(x)\ _ Q(x 1 )Q(x 2 )---Q(x n )P[Q(x)\ 

Xi,x 2 , ,X N P(xi,X 2 , -;Xn) J[dQ(x)]Q(xi)Q(x 2 ) ■ ■ ■ Q{x N )P[Q(x )]' 

where the factors Q{xi) arise because each Xi is chosen independently from the distribution Q(x), and P[Q] encodes 
the a priori hypotheses about the form of Q. The optimal least-square estimate of Q, Q es t(x, {a^}), is then 
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where (• • •}^°' ) denotes an expectation value with respect to the a priori distribution P[Q(x)\. 

In this field-theoretic setting, Bialek et al. J|] assumed that the prior distribution P[Q\ should penalize large 
gradients, so written in terms of an unconstrained variable <fi = — In (£Q) £ (— 00 , + 00 ), they assumed 


Pe[(/)(x)\ = — exp 


- l - J dx{d x 4>f 
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with £ a parameter that they later averaged over independently. This form for the prior distribution is very simple, 
and quite minimal in terms of underlying assumptions, so conclusions drawn from it should be robust. 

Write Q{x) = exp(— (j>{x))/£, with 


P[4>\ = - exp 


- 7 ; J dx{d x (f)) 2 


1 - - / dxe-+ lx ) 
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Now, we want to evaluate 


(Q{xi)Q(x 2 ) ■ • •Q(.'Civ)) (C)) = J T></>P[(/>] exp[-</>(xi)] = J ^ J D</>exp [-£((/>; A)], 


(5) 


where 


= ^ J d x(d x cf>) 2 + J dxe <f>(xi) — iX. (6) 

Ref. Q showed that the integral in eq. || can be evaluated by saddle-points for N large, and showed that the best 
estimate for the inferred distribution tends to the true distribution. The width on the space of distributions (in other 
words, error bars about the distribution) can also be computed in this formulation. 

Since I will use a slightly different analysis for the parton distribution function problem, adapted from jf| (see 
eq.’s below), I first recast their analysis |^] in the framework I need. Finding the configuration which 

extremizes the classical action, eq. 0, is equivalent to maximum likelihood estimation, which chooses the distribution, 
Q(x), that maximizes P[Q(a;)|{a;i}J7 The classical equations of motion for (j) and A are 
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£d 2 Mx) + = Y <*(* ~ *0 


i= 1 

dxe~ Mx) = 1. 


Eq. [i] and eq. || imply *A c i = N, provided that d<j){x) vanishes as \x\ —► 00 . 
It is easy to solve eq. [?]. Notice that the equation 

d 2 J[x ) + 7Ve- /(x) = 0 


(7) 

( 8 ) 


(9) 


is solved by 


/(*;-,>) = 21n( Sillh( ° ( ;- ! ’)) )-l n | F . (10) 

Assume first that the data points are all distinct, and let £ = 1 for the time being—it is easily restored by dimensional 
analysis. We order the data points { 2 ^} in ascending order x-i < 2 ^+ 1 . On the interval [xi- 1 , Xi ], with 2:0 = — 00 , Xn+i = 
+ 00 , we take the solution <j> c \ = f(x] aj_i, 6j-i). ag = 0 = ajv is implied by the asymptotics above. Remark that 
/ = ln( 2 ; — b ) 2 — In A is the limiting case when a j 0. At Xi, i = 1,. .., N we impose the matching conditions 

sinh(aj_i(a;i - b»-i)) _ sinh(qj(xi - h)) 

CLi—\ 
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which makes 4> c \ continuous, and 


sinh(a i _i(x i - ^_i)) 

2-= cosh(ai(xj - bi)) - cosh(ai_i(a;i - b z - 1)) (12) 

O-i -1 

which accounts for the delta functions at xt. There are 2TV matching conditions, and there are 2N free parameters, 
since there are N + 1 intervals, but ao and a./v have been fixed by the asymptotics already. If data points coincide, 
the number of intervals is appropriately reduced, and the right hand side of [L3 is multiplied by the number of data 
points at that value of x t . Thus we have an explicit exact solution of the variational equations valid for arbitrary N. 
The parton distribution function problem will require assuming that the probability distribution has finite support 
on the unit interval, but the changes required in these formulas for this case are rather minimal, and are explained in 

§!• n „ 

What we really want to do, of course, is to compute eq. As in |2j], but with the exact solution (j) c i(x; {xj}) 

constructed above, the one-loop contribution is 

Sl = l{j) / J dxe ~ kMx) ’ ( 13 ) 

up to terms that are lower order in powers of N. (Some care is needed in taking into account the fluctuations in A, 
since these mix with constant shifts in <f >, but this is not difficult.) So finally, we note that 
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up to higher loop corrections. While d x (f> c \ is not continuous, it is square-integrable, so this expression is well-defined. 
The last step in the problem is to determine i. This is explained in Q from a ‘global’ perspective, with i constant 
over the whole support of the distribution, and a variant is explained in j|j5| from a local perspective. 

Turning now to the parton distribution function problem, there are two new complications beyond the evaluation 
of eq. ||. As mentioned above, parton distribution functions are inputs into the determination of cross-sections. 
These cross-sections, in turn, are the quantities that are experimentally measured. Thus the problem of inferring 
parton distribution functions is a Bayesian problem with a different prior distribution compared to the simple direct 
inference problem addressed in Q. The prior distribution favors continuous parton distribution functions, but the 
actual probabilities (in other words, the cross-sections) are (in general) nonlinear functions of the parton distribution 
functions. For example, for the Drell-Yan process 

vab{p,p') = / / AxAx'a i:j {xp,xp',p- 1 p f )(j) i \ A (x,p]pf)(l) j \ B {x'.p-^f) ( 15 ) 
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is the cross-section, with 4>j\B the probability of finding a parton of type j carrying fraction x of the total momentum 
in a hadron B , and &ij(xp,xp ', p\ p,f) is the cross-section for scattering partons of type i,j. Here the dependence 
on the factorization scale and the normalization scale have been indicated explicitly. While the dependence of the 
cross-section on the parton distribution functions is nonlinear in this example, there are also examples where the 
dependence is linear, for example, fragmentation cross-sections 


da c (l) = ^ 




da; Aa. l {l/x,p,- 1 p f )D C \ i {x,ii-, p f ), 


( 16 ) 


where -Dc|* is the fragmentation function for parton i to produce a hadron C with momentum l. Furthermore, the 
parton distribution functions satisfy the GLAP equations, which are linear equations. Thus we do not integrate 
over parton distribution functions at all values of /a/, but rather at a chosen value, say po, with parton distribution 
functions at other values determined by the GLAP equations 




[x/^asip 2 )}^\ A {i,p\p f ). 


( 17 ) 


These equations are a consequence of factorization and the fact that physics is independent of the choice of factorization 
scale. 


3 



This is an outline of the situation we wish to study. Let us abstract from these facts the problem we need to solve. 
We want to determine the values of some continuous Helds fa(x,t), given that these fields determine some families 
of probability distributions Q a [(f>](t). We are given a finite set of experimental data {Q ai (ti)}, with the ay the type 
of cross-section observed for data point /, and tj the momentum transfer for this data point. We are normalizing 
cross-sections as appropriate for probability distributions. If the GLAP equations were unknown, we would have to 
determine fa(x,t) for all values of t , and the problem would involve a two-dimensional field theory as opposed to the 
quantum mechanics problem, eq. |s]. As it happens, the GLAP equations are exact to all orders in perturbation theory 
[jl]. We need only determine fa(x,to ) at a given value of t 0 - We can write the solution of eq. |l7| as 

fa(x,t) = / dx'K i j(x,x';t,t 0 )(l>j(x , ,to). (18) 

Jo 

In this abstract formulation, the fa need not be positive or normalized in any sense. For the actual physical problem, 
the (j)i themselves are probability distributions. This is the point where the choice of a Bayesian prior affects the 
width of the distribution on the space of distributions, in other words, affects the error bars around the inferred parton 
distribution functions. The choice of prior also affects the convergence to the ‘true’ parton distribution functions as 
the sample size becomes larger. To be concrete, we observe that eq. 0 is naturally expressed in terms of <j>, whereas for 
calculation in the functional integral we are formulating, the natural variable to use is In (j), which is not constrained 
to be positive definite. Assigning a priori probabilities that favour continuity of <fi leads to a measure in the functional 
integral weighted by J (d x fa) 2 , whereas favoring the continuity of huj) leads to f fa^ 2 (d x fa) 2 . There seems to be no way 
to decide on the appropriate Bayesian prior without actual data analysis with different priors—this caveat applies to 
practically all Bayesian statistical inference. 

To be explicit, I consider each prior in turn. For either choice, our problem is to evaluate 



dxfa(x,to) - 1 



Q ai {ti) x exp(-S'o), 


(19) 


with So = 5 / dxY^j(dx4>j{x,to)) 2 with the measure d/i = D<J>i(to), and the restriction fa > 0 , or So = 
i / d xY^jifiJ^^dxfijixfao)) 2 w ith the measure dfi = Dln^j(t 0 )- Eq. |l9] is obviously of the same variety as eq. [|. 
There is no conceptual or computational problem in incorporating other priors if the physics warrants such changes. 
The variational equation determining the maximum likelihood distributions is 


9 x fa(x, to) - iX j = ~Y ] f dx’ Qa ; K kj ( Xx; fa 

yyJ o 6(t> k {x’fai) 


to) 


for the former choice of So, and 


dxfa 2 d x fa(x, t 0 ) - i\j = - Y] [ dx' ^ Q , a ; ^ K k j (x\ x; fa, t 0 ) 

yyj o o(pk{x ,ti) 


( 20 ) 


( 21 ) 


for the latter choice. Here A j are Lagrange multipliers, which ensure that f fadx 
terms of ipj = — In 4>j : 


1. It is simpler to write eq. 21 


dli/)j(x,t 0 ) + iXj exp (-tpj) 



, 5 In Q ai (fa) 
X W k (x',fa) 


K kj (x , ,x;t I ,t 0 ) 


( 22 ) 


since we do not have to constrain (/>/- to be positive, unlike 4> k - These are the general forms appropriate for obtaining 
parton distribution functions from a global fit to all available data, including different physical processes in one fell 
swoop. Obviously one can restrict the data points included to be e.g. deep inelastic scattering events alone, or 
Drell-Yan events alone and compare parton distribution functions so obtained Q. 

Of course, this global fit may not be desirable in all instances since it is (usually) inappropriate to combine data-sets 
with different systematic errors, but it may be helpful to have a formula available that explicitly implements a global 
fit, perhaps as a check. Further, I cannot give an analytical solution to eq. ^2] analogous to eq. |lTi|. This is not terribly 
disturbing since numerical methods are required for solving the matching equations (eq.’s | 11 | , | 12 | ) in any event. I do 
not discuss error estimations in depth, since they were the focus of the analysis in ||. Suffice it to say that in Bayesian 
approaches, what one calculates is the probability that a given set of parton distribution functions is the true set. 
The width of the distribution on the space of parton distribution functions can be extracted without much effort by 
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computing the effective action corresponding to the functional integral in eq. not just the effective action at a 
stationary point. 

An interesting possibility that one could also address in this framework is the following: The GLAP equations are 
valid to all orders in perturbation theory. One might wonder if there is any experimental evidence for non-perturbative 
deviations from the GLAP equations. This is easily incorporated in the present framework in several possible ways. 
Basically, instead of imposing the GLAP equations by hand, we need to integrate over all fa independently at all 
values of t. and impose an additional term in the action \ J dtdxJ2j(c/)J 1 d x fa(x,t)) 2 of the general form 



where GLAP ij(x / x '; t) stands for the operator that appears on the right hand side of eq. HI Such a term implements 
a competition between smoothness and satisfying eq. 0 in the Bayesian sense. 

There are various aspects of the suggestion in this report that need to be carefully considered before the suggested 
framework will be of any value. The problem of determining parton distribution functions is not a theorist’s concoction, 
but a practical one. As such, a great advantage is that the proposed scheme can be tested with real data, and rejected 
or accepted accordingly. 

I am grateful to George Sterman for his suggestion, to Curt Callan for encouraging comments, and to Michael 
Peskin for drawing my attention to ||. This work was supported in part by NSF grant PHY96-00258. 
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